5 research outputs found

    Circuit Techniques for Adaptive and Reliable High Performance Computing.

    Increasing power density with process scaling has caused the clock speed of modern microprocessors to stagnate. Accordingly, designers have adopted multicore architectures based on message passing and shared memory to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel, so improving single-thread performance remains critical. Reliability is also worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. With the move to multicore computing, reliable signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternative, efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation tolerance, performance and reliability, targeted at datapath logic, signal synchronization and memories. Firstly, a domino-logic-based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide performance gains of up to 71% over conventional domino logic in a 32b x 32b multiplier in 65nm CMOS. Margins are reduced until functionality errors are detected, and the detected errors are used to guide the tuning. Secondly, for signal synchronization across clock domains, a new class of dynamic-logic-based synchronizers with single-cycle synchronization latency is presented, in which pulses, rather than stable intermediate voltages, cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2GHz in 65nm CMOS.
    Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup. Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints.

    PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd
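
To give a sense of scale for the ~1e6x mean-time-between-failures figure quoted above, the following is a minimal sketch using the classical textbook synchronizer model, MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data). This is not necessarily the model used in the dissertation, whose synchronizers fail through pulses rather than stable intermediate voltages. Every numeric value except the 2GHz clock is a hypothetical placeholder chosen for illustration.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Classical model: MTBF (s) = exp(t_resolve / tau) / (t_window * f_clk * f_data)."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

def extra_resolution_time(gain, tau):
    """Extra resolution time that multiplies MTBF by `gain` in the classical model."""
    return tau * math.log(gain)

# Hypothetical values for illustration only (not taken from the dissertation).
TAU = 20e-12        # regeneration time constant
T_WINDOW = 10e-12   # metastability window
F_CLK = 2e9         # receiving clock; 2 GHz matches the frequency quoted in the abstract
F_DATA = 100e6      # rate of asynchronous data events
T_RESOLVE = 0.4e-9  # baseline time allowed for metastability to resolve

base = synchronizer_mtbf(T_RESOLVE, TAU, T_WINDOW, F_CLK, F_DATA)
extra = extra_resolution_time(1e6, TAU)
improved = synchronizer_mtbf(T_RESOLVE + extra, TAU, T_WINDOW, F_CLK, F_DATA)

print(f"baseline MTBF: {base:.3e} s")
print(f"extra resolution time for a 1e6x gain: {extra * 1e12:.1f} ps")
print(f"improvement factor: {improved / base:.3e}")
```

In this model a 1e6x gain is equivalent to ln(1e6), or about 13.8, additional regeneration time constants of resolution time, which is why amplifying the metastable event can be attractive compared with simply waiting longer.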

    Adaptive Robustness Tuning for High Performance Domino Logic

    Abstract: A new domino design style is proposed that provides performance gains of up to 71% over conventional domino, and is demonstrated in a 32b multiplier in 65nm CMOS. The design dynamically tunes domino gates to trade surplus noise margins at nominal conditions for performance by detecting stability errors during runtime while guaranteeing correct operation.

    Introduction: While many chips are constrained by power, speed-critical circuit portions in a design continue to benefit from targeted use of a high-performance logic design style.

    Proposed Approach: ART Domino performs two evaluations of the domino gate. The ART Domino gate operates in four phases: (a) Speculative Precharge (SP), where the gate is precharged with margins removed; (b) Speculative Evaluate (SE), where the gate performs a fast, speculative evaluation; (c) Checker Precharge (CP), where the gate is precharged with restored margins; (d) Checker Evaluate (CE), where the gate performs a slower "always correct" evaluation. During the Speculate (SPEC) phase, precharge voltage V_X is lowered to TVDD and voltage V_Y on the output inverter is raised to TVSS, speeding critical transitions at both nodes by reducing voltage swings. Raising V_Y also speeds the following gate by trading its noise margin for speed. During the Check (CHECK) phase, robustness margins are restored and a safe evaluation checks for errors. The values of TVDD and TVSS are tuned to operate the design at the edge of failure, thereby maximizing performance gains and automatically tracking PVT conditions. Both halves of each pipe stage perform safe evaluations simultaneously. The second half passes its result via phase overlap to the next stage, which in turn performs simultaneous safe evaluations on its two halves. Fully-margined gates are used in the error logic for "always correct" operation.
The speculated and checked results of the segment are copied to domino latches, and the two values are compared in the following SPEC cycle. As in all design styles incorporating timing speculation, metastability can occur on the error signal in an ART Domino design and cause error detection failures. We propose solutions to suppress the two sources of metastability in the latch DOMBUF:
1) Metastability due to genuine timing violations during SE is minimized by providing an additional half cycle of slack for the latch to evaluate.
2) Unintentional leakage in the preceding gate (Gate marked X i
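
The tuning behaviour described in this entry, where margins are removed until the checker flags an error so that the design runs at the edge of failure, can be sketched as a simple search. This is a simplified illustration under stated assumptions, not the paper's controller: `has_error` is a hypothetical callback standing in for the comparison of the speculative result against the fully margined checker result, and the integer `step` is a hypothetical abstraction of how far TVDD has been lowered and TVSS raised.

```python
def tune_margin(has_error, steps=16):
    """Return the largest margin-reduction step that the checker still accepts.

    has_error(step) is a hypothetical callback: True when the speculative
    evaluation at that step disagrees with the fully margined check.
    """
    step = 0
    # Try one more step of margin removal; stop as soon as it would fail.
    while step < steps and not has_error(step + 1):
        step += 1
    return step  # last setting at which no error was detected

# Toy model (assumption): errors appear once more than 11 steps of margin are removed.
FAIL_ABOVE = 11
setting = tune_margin(lambda s: s > FAIL_ABOVE)
print(setting)
```

Because the search is driven by observed errors rather than a fixed worst-case margin, rerunning it after conditions change would settle on a different step, which is the sense in which such a scheme can track PVT conditions.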